
LASSO

Loss Function

$$L(\vec{x})=\frac{1}{2}(y_{i}-A^{*}_{ij}x_{j})(y_{i}-A_{ik}x_{k})+\frac{\lambda_2}{2}[(D^{1}_{ij} x_{j})(D^{1}_{ik}x_{k})+ (D^{2}_{ij} x_{j})(D^{2}_{ik}x_{k})]+ \lambda_1 \sigma_i |x_i|,$$

where $x_i$ is the $i$-th component of the real vector $\vec{x}$, and $A_{ij}$ and $D^{1,2}_{ij}$ are the $(i,j)$ components of the matrices $\mathbf{A}$ and $\mathbf{D^{1,2}}$, respectively. $A_{ij}$ is a complex number, whereas the $D^{1,2}_{ij}$ are real numbers. $A^{*}_{ij}$ denotes the complex conjugate of $A_{ij}$. Repeated indices are summed over (Einstein convention).

$\mathbf{A}$ is the transformation operator from the real dictionary field to the observed complex shear field. Note that we do not consider the $B$-mode, so $\vec{x}$ is a real vector.
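As a concrete (hypothetical) instance of this loss, with small dense stand-ins for $\mathbf{A}$, $\mathbf{D^{1,2}}$, and $\sigma$ (none of these are the actual sparseBase objects):

```python
import numpy as np

# Hypothetical small problem: A is complex (dictionary -> shear),
# D1/D2 are periodic difference operators, sigma is a per-coefficient
# noise level. These stand-ins are NOT the sparseBase quantities.
rng = np.random.default_rng(0)
n = 8
A  = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
D1 = np.roll(np.eye(n), 1, axis=1) - np.eye(n)
D2 = np.roll(np.eye(n), 1, axis=0) - np.eye(n)
sigma = np.ones(n)
lbd1, lbd2 = 0.1, 0.05

def loss(x, y):
    res  = y - A @ x                              # y_i - A_{ij} x_j (x real)
    chi2 = 0.5 * np.real(np.conj(res) @ res)      # (1/2) |y - A x|^2
    tsv  = 0.5 * lbd2 * ((D1 @ x) @ (D1 @ x) + (D2 @ x) @ (D2 @ x))
    l1   = lbd1 * np.sum(sigma * np.abs(x))       # weighted l1 penalty
    return chi2 + tsv + l1

x = rng.normal(size=n)
y = A @ x + 0.01 * (rng.normal(size=n) + 1j * rng.normal(size=n))
print(loss(x, y))        # small: near-zero chi2 plus TSV and l1 penalties
print(loss(np.zeros(n), y))   # large: the full chi2 of the data
```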

# sparseBase.massmapSparsityTask.__init__
## Initialization of the Dictionary Space (A)
# 1) Basis vectors used to model the Surface Density of halos
if dicname=='starlet':
    from halolet import starlet2D
    self.dict2D =   starlet2D(gen=2,nframe=self.nframe,ny=self.ny,nx=self.nx)
elif dicname=='nfwlet':
    from halolet import nfwlet2D
    self.dict2D =   nfwlet2D(nframe=self.nframe,ngrid=self.nx,smooth_scale=-1)
# 2) From Surface Density to Excess Surface Density
self.ks2D   =   massmap_ks2D(self.ny,self.nx)

# 3) Lensing Kernel: from Excess Surface Density to Shear

if self.nlp<=1:
    self.lensKernel =   np.ones((self.nz,self.nlp))
else:
    self.lensKernel =   np.zeros((self.nz,self.nlp))
    for i,zs in enumerate(zsbin):
        self.lensKernel[i,:]    =   self.cosmo.deltacritinv(zlbin,zs)
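For reference, `deltacritinv` presumably returns the inverse critical surface density in the standard lensing convention (an assumption, since the cosmology object is not shown here):

$$\Sigma_{\rm crit}^{-1}(z_l,z_s)=\frac{4\pi G}{c^2}\,\frac{D_l\, D_{ls}}{D_s},$$

where $D_l$, $D_s$, and $D_{ls}$ are the angular diameter distances to the lens, to the source, and between the two.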

$\mathbf{D^{1,2}}$ refer to the difference operators along the two transverse (ra, dec) directions. $\lambda_2$ controls the amplitude of the Total Square Variance (TSV) term.

# sparseBase.massmapSparsityTask.gradient_TSV
# Definition of D1 and D2
difx    =   np.roll(self.alphaR[:,0,:,:],1,axis=-1)
difx    =   difx-self.alphaR[:,0,:,:] # D1
dify    =   np.roll(self.alphaR[:,0,:,:],1,axis=-2)
dify    =   dify-self.alphaR[:,0,:,:] # D2
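The `np.roll` construction above implements a periodic one-pixel difference. On a hypothetical 4x4 map (not a sparseBase quantity):

```python
import numpy as np

# Periodic difference along the last axis, as in the D1 definition above.
field = np.arange(16.0).reshape(4, 4)
difx = np.roll(field, 1, axis=-1) - field   # (D1 x)_i = x_{i-1} - x_i, periodic
dify = np.roll(field, 1, axis=-2) - field   # same one-pixel shift along dec

# Interior pixels differ by -1 along a row; the wrap column picks up +3,
# so each row of the periodic difference sums to zero.
print(difx[0])   # [ 3. -1. -1. -1.]
```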

Second Order Term

We separate out the second order terms in the loss function.

$$F(\vec{x})=\frac{1}{2}(y_{i}-A^{*}_{ij}x_{j})(y_{i}-A_{ik}x_{k})+\frac{\lambda_2}{2}[(D^{1}_{ij} x_{j})(D^{1}_{ik}x_{k})+ (D^{2}_{ij} x_{j})(D^{2}_{ik}x_{k})].$$

The gradient of it is

$$\partial_\alpha F(\vec{x})=-A_{i\alpha}^{*} (y_{i}-A_{ij}x_{j}) + \lambda_2(D^1_{i\alpha}D^1_{ij} x_{j}+ D^2_{i\alpha} D^2_{ij} x_j).$$
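This gradient formula can be checked numerically against finite differences of $F$ (a hypothetical small problem with real $\mathbf{A}$ for simplicity; the sparseBase $\mathbf{A}$ is complex):

```python
import numpy as np

# Finite-difference check of the gradient formula above on a toy problem.
rng = np.random.default_rng(1)
n = 6
A  = rng.normal(size=(n, n))
D1 = np.roll(np.eye(n), 1, axis=1) - np.eye(n)
D2 = np.roll(np.eye(n), 1, axis=0) - np.eye(n)
lbd2 = 0.3
y = rng.normal(size=n)

def F(x):
    res = y - A @ x
    return 0.5 * res @ res + 0.5 * lbd2 * ((D1 @ x) @ (D1 @ x) + (D2 @ x) @ (D2 @ x))

def gradF(x):
    # -A_{i a}(y_i - A_{ij} x_j) + lbd2 (D1^T D1 + D2^T D2) x
    return -A.T @ (y - A @ x) + lbd2 * (D1.T @ (D1 @ x) + D2.T @ (D2 @ x))

x = rng.normal(size=n)
eps = 1e-6
num = np.array([(F(x + eps * np.eye(n)[a]) - F(x - eps * np.eye(n)[a])) / (2 * eps)
                for a in range(n)])
print(np.max(np.abs(num - gradF(x))))   # tiny: F is quadratic
```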
# sparseBase.massmapSparsityTask.gradient
# calculate the gradient of the second-order component of the loss function,
# which includes the Total Square Variance (TSV) and chi2 terms
gCh2=self.gradient_chi2()
gTSV=self.gradient_TSV()

# sparseBase.massmapSparsityTask.gradient_chi2
# calculate the gradient of the chi2 component
shearRTmp   =   self.main_forward(self.alphaR) #A_{ij} x_j
self.shearRRes   =   self.shearR-shearRTmp     #y_i-A_{ij} x_j
dalphaR     =   -self.main_transpose(self.shearRRes)/2. #-A_{i\alpha}(y_i-A_{ij} x_j)/2

# sparseBase.massmapSparsityTask.gradient_TSV
gradx   =   np.roll(difx,-1,axis=-1)
gradx   =   gradx-difx    # (D1)_{i\alpha} (D1)_{ij} x_j
grady   =   np.roll(dify,-1,axis=-2)  
grady   =   grady-dify    # (D2)_{i\alpha} (D2)_{ij} x_j
dalphaR[:,0,:,:]=(gradx+grady)*self.lbd2/2.

Pathwise Coordinate Descent

The starting point is set to $x_i^{(0)}=0$. We approach the minimum of the loss function $L(\vec{x})$ by iteratively updating $\vec{x}$. In each iteration, we find the coordinate on which the projection of the derivative has the largest signal-to-noise ratio. This coordinate is recorded as $x_{\alpha}$ and the largest signal-to-noise ratio as $\rm{S/N}_{\alpha}$. We update $\lambda=\rm{max}(0.98\,\rm{S/N}_{\alpha}, \lambda_1)$, and then update along this specific coordinate $x_{\alpha}$.

$$x_{\alpha}^{(n+1)}=S_{\lambda\sigma_{\alpha}}(x_{\alpha}^{(n)}-\frac{\partial_{\alpha}F(\vec{x})}{A_{i\alpha}A_{i\alpha}+4\lambda_2}),$$

$S_{\lambda}$ is the soft thresholding function defined as:

$$S_{\lambda} (x_{\alpha})=\rm{sign}(x_{\alpha}) \rm{max}(|x_{\alpha}| - \lambda,0)$$
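A minimal sketch of this operator (the actual sparseBase `soft_thresholding` and `firm_thresholding` helpers may differ in detail):

```python
import numpy as np

# Soft-thresholding operator S_lambda as defined above:
# shrink toward zero by thr, and clip to zero inside [-thr, thr].
def soft_thresholding(x, thr):
    return np.sign(x) * np.maximum(np.abs(x) - thr, 0.0)

print(soft_thresholding(np.array([-2.0, -0.3, 0.1, 1.5]), 0.5))
```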
self.dalphaR =   -self.mu*self.gradient().real
# Update thresholds
snrArray=np.abs(self.dalphaR)/self.sigmaA
ind=np.unravel_index(np.argmax(snrArray, axis=None), snrArray.shape)
if snrArray[ind]<self.lbd:
    self.lbd_path[irun:]=self.lbd
    return
elif snrArray[ind]>1.02*self.lbd_path[irun-1]:
    # do not decrease the threshold if the maximum S/N jumps above
    # the previous threshold; keep the previous value in lbd_path
    self.lbd_path[irun]=self.lbd_path[irun-1]
else:
    # decrease threshold
    self.lbd_path[irun]=max(snrArray[ind]*0.98,self.lbd)
    self.thresholds=self.lbd_path[irun]*self.sigmaA
dum     =   self.alphaR+self.dalphaR
if threM=='ST':
    self.alphaR[ind]  = soft_thresholding(dum,self.thresholds)[ind]
elif threM=='FT':
    self.alphaR[ind]  = firm_thresholding(dum,self.thresholds)[ind]

Note that here $A_{i\alpha} A_{i\alpha}$ should be estimated taking into account the mask of the observed shear field.

# sparseBase.massmapSparsityTask.spectrum_norm
# Estimate A_{i\alpha} A_{i\alpha}
asquareframe=np.zeros((self.nz,self.nframe,self.ny,self.nx))
for iz in range(self.nz):
    maskF=np.fft.fft2(self.mask[iz,:,:])
    for iframe in range(self.nframe):
        fun=np.abs(self.ks2D.transform(self.dict2D.fouaframes[iframe,:,:],outFou=False))**2.
        asquareframe[iz,iframe,:,:]=np.fft.ifft2(np.fft.fft2(fun).real*maskF).real

spectrum=np.sum(self.lensKernel[:,:,None,None,None]*asquareframe[:,None,:,:,:],axis=0)
self.mu=1./(spectrum+4.*self.lbd2)
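The $4\lambda_2$ in the denominator of the coordinate update comes from the diagonal of the TSV Hessian: each of $D^{1T}D^{1}$ and $D^{2T}D^{2}$ contributes $2$ per pixel for a periodic one-pixel difference. A quick check on a hypothetical 1D operator:

```python
import numpy as np

# Periodic one-pixel difference operator on a hypothetical n-pixel axis:
# (D x)_i = x_{i+1} - x_i  (same D^T D diagonal as the x_{i-1} - x_i form).
n = 5
D = np.roll(np.eye(n), 1, axis=1) - np.eye(n)

# Each column of D has exactly one +1 and one -1, so diag(D^T D) = 2;
# the two transverse directions (ra, dec) together give 4.
print(np.diag(D.T @ D))   # [2. 2. 2. 2. 2.]
```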

FISTA Gradient Descent

The starting point is set to $x_i^{(0)}=0$. We approach the minimum of the loss function $L(\vec{x})$ by iteratively updating $\vec{x}$. In each iteration, we step along the gradient of $F(\vec{x})$, and each coordinate $x_j$ of $\vec{x}$ is updated as follows:

$$x_{j}^{(n+1)}=S_{\lambda_1 \sigma_{j}}(x_{j}^{(n)}- \mu \partial_{j} F(\vec{x}) ),$$

$\mu$ is the step size of the iteration and $S_{\lambda}$ is the soft thresholding function defined above. FISTA then accelerates the plain gradient step with a momentum term:

$$t^{(0)}=0$$

$$t^{(n+1)}= \frac{1+\sqrt{1+4(t^{(n)})^2}}{2}$$

$$\vec{x}^{(n+1)}\leftarrow\vec{x}^{(n+1)}+\frac{t^{(n)} -1}{t^{(n+1)}} (\vec{x}^{(n+1)}-\vec{x}^{(n)})$$
dalphaR =   -self.mu*self.gradient().real
dum     =   self.alphaR.real+dalphaR
if threM=='FT':
    dum  = firm_thresholding(dum,self.thresholds)
elif threM=='ST':
    dum  = soft_thresholding(dum,self.thresholds)
#update x_\alpha according to FISTA
tnTmp= (1.+np.sqrt(1.+4*tn**2.))/2.
ratio= (tn-1.)/tnTmp
tn   = tnTmp
self.alphaR=dum+(ratio*(dum-self.alphaR))
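The update rules above can be exercised end to end on a hypothetical toy LASSO problem (dense random $A$, two nonzero coefficients; none of this comes from sparseBase):

```python
import numpy as np

# FISTA on a toy problem: min_x 0.5 |y - A x|^2 + lbd |x|_1,
# with step size mu = 1 / largest eigenvalue of A^T A.
rng = np.random.default_rng(2)
m, n = 40, 20
A = rng.normal(size=(m, n)) / np.sqrt(m)
x_true = np.zeros(n); x_true[[2, 7]] = [1.0, -2.0]
y = A @ x_true
lbd = 1e-4
mu = 1.0 / np.linalg.eigvalsh(A.T @ A).max()

def soft(x, thr):
    return np.sign(x) * np.maximum(np.abs(x) - thr, 0.0)

x = np.zeros(n); x_prev = x.copy(); tn = 1.0
for _ in range(2000):
    grad = -A.T @ (y - A @ x)                     # gradient of smooth part
    dum = soft(x - mu * grad, mu * lbd)           # proximal gradient step
    tn_next = (1.0 + np.sqrt(1.0 + 4.0 * tn**2)) / 2.0
    x, x_prev = dum + (tn - 1.0) / tn_next * (dum - x_prev), dum
    tn = tn_next
print(np.max(np.abs(x_prev - x_true)))   # close to zero for small lbd
```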

Note that the step size $\mu$ should be less than or equal to the reciprocal of the largest eigenvalue of the matrix $A^{*}_{ij}A_{ik}$. We estimate $\mu$ by drawing a large number of random unit vectors, applying the operator to each of them, and taking the largest resulting norm.

norm=0.
for irun in range(100):
    np.random.seed(irun)
    alphaTmp=   np.random.randn(self.nlp,self.nframe,self.ny,self.nx)
    normTmp =   np.sqrt(np.sum(alphaTmp**2.))
    alphaTmp=   alphaTmp/normTmp
    shearTmp=   self.main_forward(alphaTmp)
    alphaTmp2=  self.main_transpose(shearTmp)
    normTmp2=   np.sqrt(np.sum(alphaTmp2**2.))
    if normTmp2>norm:
        norm=normTmp2
self.mu    = 1./norm/1.1
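A common alternative to the random-probe loop above is power iteration, which converges to the largest eigenvalue directly. A sketch with a hypothetical dense matrix standing in for `main_forward`/`main_transpose` (real here, so the operator is $A^{T}A$):

```python
import numpy as np

# Power iteration estimate of the largest eigenvalue of A^T A.
rng = np.random.default_rng(3)
A = rng.normal(size=(50, 30))   # hypothetical stand-in for the forward operator

v = rng.normal(size=30)
v /= np.linalg.norm(v)
for _ in range(1000):
    w = A.T @ (A @ v)           # apply A^T A: forward then transpose
    norm = np.linalg.norm(w)
    v = w / norm                # renormalize; v converges to the top eigenvector

# `norm` converges to the largest eigenvalue of A^T A
print(norm, np.linalg.eigvalsh(A.T @ A).max())
```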

A Simple Simulation

$M_{200}=4.41 \times 10^{14} M_{\odot}$

$z=0.38$

$\sigma_\gamma=0.25$

$40$ galaxies per arcmin

Pixel frame only

Pixel frame + one halo frame

3D reconstruction

We should use the pathwise coordinate descent algorithm; otherwise the reconstructed structures are significantly blurred in the line-of-sight direction.

The reason is that the lensing kernels of different lens planes are highly correlated, so the basis vectors are highly nonorthogonal.

Lensing Kernels as a function of source redshift $z_s$

Pathwise Coordinate Descent Reconstruction

FISTA Gradient Descent Reconstruction

$\lambda$ as a function of iteration